TUSNADY'S INEQUALITY REVISITED 

By Andrew Carter and David Pollard 

University of California, Santa Barbara and Yale University 

Tusnady's inequality is the key ingredient in the KMT/Hungarian 
coupling of the empirical distribution function with a Brownian bridge. 
We present an elementary proof of a result that sharpens the Tusnady 
inequality, modulo constants. Our method uses the beta integral
representation of Binomial tails, simple Taylor expansion and some novel 
bounds for the ratios of normal tail probabilities. 

1. Introduction. In one of the most important probability papers of the 
last forty years, Komlos, Major and Tusnady (1975) sketched a proof for a 
very tight coupling of the standardized empirical distribution function with 
a Brownian bridge, a result now often referred to as the KMT, or Hungarian,
construction. Their coupling greatly simplifies the derivation of many
classical statistical results; see Shorack and Wellner [(1986), Chapter 12 et
seq.], for example. 

The construction has taken on added significance for statistics with its 
use by Nussbaum (1996) in establishing asymptotic equivalence of density 
estimation and white noise models. Brown, Carter, Low and Zhang (2004) 
have somewhat simplified and expanded Nussbaum's argument using our 
Theorem 2, via inequality (5). 

At the heart of the KMT method [with refinements as in the exposition 
by Csorgo and Revesz (1981), Section 4.4] lies the quantile coupling of the 
Bin(n, 1/2) and N(n/2, n/4) distributions, which may be defined as follows.
Let $Y$ be a random variable distributed N(n/2, n/4). Find the cutpoints
$-\infty = \beta_0 < \beta_1 < \cdots < \beta_n < \beta_{n+1} = \infty$ for which

$\mathbb{P}\{\mathrm{Bin}(n,1/2) \ge k\} = \mathbb{P}\{Y > \beta_k\}$ for $k = 0, 1, \ldots, n$.

When $\beta_k < Y \le \beta_{k+1}$, let $X$ take the value $k$. Then $X$ has a Bin(n, 1/2)
distribution. 
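
The quantile coupling just described can be sketched numerically. The following is a minimal illustration, not part of the original argument: the function names `standardized_cutpoint`, `cutpoints` and `couple` are hypothetical, and the standard-library `statistics.NormalDist` is assumed as the source of normal quantiles. It is intended only for moderate $n$.

```python
from math import comb, inf, sqrt
from statistics import NormalDist


def standardized_cutpoint(n, k):
    """Return z_k with P{N(0,1) > z_k} = P{Bin(n, 1/2) >= k}, for 1 <= k <= n."""
    upper = sum(comb(n, j) for j in range(k, n + 1))  # 2^n * P{X >= k}, exact
    lower = 2 ** n - upper                            # 2^n * P{X < k}, exact
    # Invert whichever tail is smaller, so the probability never rounds to 1.
    if upper <= lower:
        return -NormalDist().inv_cdf(upper / 2 ** n)
    return NormalDist().inv_cdf(lower / 2 ** n)


def cutpoints(n):
    """Return [beta_0, ..., beta_{n+1}] for Y distributed N(n/2, n/4)."""
    finite = [n / 2 + sqrt(n) * standardized_cutpoint(n, k) / 2
              for k in range(1, n + 1)]
    return [-inf] + finite + [inf]


def couple(y, betas):
    """Return the k with beta_k < y <= beta_{k+1}, the coupled value of X."""
    return max(k for k, beta in enumerate(betas[:-1]) if beta < y)


if __name__ == "__main__":
    betas = cutpoints(20)
    for k in (11, 15, 20):
        # Third column: distance from the continuity-corrected guess k - 1/2.
        print(k, round(betas[k], 4), round(betas[k] - (k - 0.5), 4))
```

Working with exact integer counts before dividing by $2^n$ keeps the two tails exactly complementary, so the computed cutpoints inherit the symmetry of the Bin(n, 1/2) distribution up to rounding.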

It is often more convenient to work with the tails of the standard normal
$\bar\Phi(z) = \mathbb{P}\{N(0,1) > z\}$ and the standardized cutpoint $z_k = 2(\beta_k - n/2)/\sqrt{n}$,
thereby replacing $\mathbb{P}\{Y > \beta_k\}$ by $\bar\Phi(z_k)$.

Symmetry considerations show that $\beta_{n-k+1} = n - \beta_k$, so that it suffices
to consider only half the range for $k$. More precisely, when $n$ is even, say
$n = 2m$, the interval $(\beta_m, \beta_{m+1})$ is symmetric about $n/2$, so we have only
to consider $k \ge m + 1 = (n + 2)/2$. When $n$ is odd, say $n = 2m + 1$, the
interval $(\beta_m, \beta_{m+2})$ is symmetric about $n/2 = \beta_{m+1}$, so we have only to
consider $k \ge m + 2 = (n + 3)/2$.

The usual normal approximation with continuity correction suggests that
$\beta_k \approx k - 1/2$, which, if true, would bound $|X - Y|$ by a constant that does
not change with $n$. Of course, such an approximation for all $k$ is too good to
be true, but results almost as good have been established. The most elegant
version appeared in the unpublished dissertation (in Hungarian) of Tusnady
(1977), whose key inequality may be expressed as the assertion 

(1)    $k - 1 \le \beta_k \le \dfrac{3n}{2} - \sqrt{2n(n-k)}$  for $n/2 \le k \le n$.



As explained by Csorgo and Revesz [(1981), Section 4.4], Tusnady's inequality
implies that $|X - n/2| \le |Y - n/2| + 1$ and $|X - Y| \le 1 + Z^2/8$, where
$Z$ denotes the standardized variable $(2Y - n)/\sqrt{n}$. They also noted that
Tusnady's proof of inequality (1) was "elementary," but "not at all simple."
Bretagnolle and Massart [(1989), Appendix] published another proof
of Tusnady's inequality: an exquisitely delicate exercise in elementary
calculus and careful handling of Stirling's formula to approximate individual
Binomial probabilities. With no criticism intended, we note that their proof is
quite difficult. More recently, Dudley [(2000), Chapter 1] and Massart (2002)
have reworked and refined the Bretagnolle/Massart calculations. Clearly,
there is a continuing perceived need for an accessible treatment of the
coupling result that underlies the KMT construction. 

With this paper we offer another approach, which actually leads to an
improvement (modulo constants) of the Tusnady inequality. In fact, the
Tusnady upper bound greatly overestimates $\beta_k$ for moderate to large $k$.
(See below.) Our method differs from that of Bretagnolle and Massart, in
that we work directly with the whole tail probability. Our method is closer
to that of Peizer and Pratt (1968), who suggested a Cornish-Fisher expansion
of the Binomial percentiles; but, as noted by Pratt [(1968), Sections 5
and 8], a rigorous proof by this method is difficult. To avoid the difficulty,
Molenaar [(1970), Section III.2] made a more direct calculation starting from
the representation of the Binomial tail as a beta integral, which for the
Bin(n, 1/2) case reads

(2)    $\mathbb{P}\{\mathrm{Bin}(n,1/2) \ge k\} = \dfrac{n!}{(k-1)!\,(n-k)!} \displaystyle\int_0^{1/2} t^{k-1}(1-t)^{n-k}\,dt.$

He indicated that his expansion would be valid provided $|k - n/2| = O(\sqrt{n})$.
Pratt seemed to be claiming validity for his expansion for the range
$|k - n/2| = o(n)$, but we believe extra work is needed for $|k - n/2|$ large.

We should point out that Peizer, Pratt and Molenaar were actually concerned
with normal approximations to distributions more general than the
Bin(n, 1/2) case needed for the KMT construction. We have specialized their
results to this case. 

Our method also starts from the integral representation (2), to derive
an approximation via Laplace's method for integrals [de Bruijn (1981),
Section 4.3] using only Taylor's theorem and Stirling's formula [Feller (1968),
Section II.9]

(3)    $n! = \sqrt{2\pi}\,\exp\bigl((n + \tfrac12)\log n - n + \lambda_n\bigr)$

with $(12n+1)^{-1} \le \lambda_n \le (12n)^{-1}$.
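
The bounds on the Stirling remainder are easy to check numerically. The sketch below is an illustration only; the helper name `stirling_remainder` is hypothetical, and `math.lgamma` is assumed as the reference value of $\log n!$.

```python
from math import lgamma, log, pi


def stirling_remainder(n):
    """lambda_n, defined by log n! = log sqrt(2*pi) + (n + 1/2) log n - n + lambda_n."""
    return lgamma(n + 1) - (0.5 * log(2 * pi) + (n + 0.5) * log(n) - n)


def within_feller_bounds(n):
    """Check 1/(12n + 1) <= lambda_n <= 1/(12n)."""
    return 1 / (12 * n + 1) <= stirling_remainder(n) <= 1 / (12 * n)


if __name__ == "__main__":
    for n in (1, 5, 27, 50):
        print(n, stirling_remainder(n), within_feller_bounds(n))
```

The check is restricted to small $n$ in practice, because the gap between $\lambda_n$ and $(12n)^{-1}$ shrinks like $n^{-3}$ and eventually falls below floating-point resolution.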

In fact [Komlos, Major and Tusnady (1975), page 130], the KMT construction
only needs a result like the Tusnady inequality for values of $k$ in a
range where $|2k - n| \le \varepsilon_0 n$ for some fixed $\varepsilon_0 < 1$. For that range, a suitable
bound can be derived from classical large deviation approximations for
Binomial tails. For example, in an expanded version of the argument sketched
in the 1975 paper, Major (2000) used the large deviation approximation

$\mathbb{P}\{X \ge k\} = \bar\Phi(\varepsilon\sqrt{n})\exp(A_n(\varepsilon))$  where $\varepsilon = (2k - n)/n$,

with 

$|A_n(\varepsilon)| = O(n\varepsilon^3 + n^{-1/2})$  uniformly in $0 \le \varepsilon \le \varepsilon_0 < 1$.

Mason (2001) derived the KMT coupling from an analogous approximation
with 

$A_n(\varepsilon) = n\varepsilon^3\lambda(\varepsilon) + O(\varepsilon + n^{-1/2})$  uniformly in $0 \le \varepsilon \le \varepsilon_0 < 1$,

where $\lambda(\cdot)$ is a power series whose coefficients depend on the cumulants
of the Binomial distribution. Such an approximation follows from a minor
variation on the general method explained by Petrov [(1975), Section 8.2].
Symmetry of the Bin(n, 1/2) makes the third cumulant zero; the power series
$\varepsilon^3\lambda(\varepsilon)$ starts with a multiple of $\varepsilon^4$.

Our method gives a sharper approximation to the Bin(n, 1/2) tails over
the range $n/2 < k \le n - 1$ (which, by symmetry, actually covers the range
$0 < k < n$). Only at the extreme, $k = n$, does the calculation fail.

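Theorem 1. Let $X$ have a Bin(n, 1/2) distribution, with $n \ge 28$. Define

$\gamma(\varepsilon) = \dfrac{(1+\varepsilon)\log(1+\varepsilon) + (1-\varepsilon)\log(1-\varepsilon) - \varepsilon^2}{2\varepsilon^4} = \displaystyle\sum_{r=0}^{\infty} \frac{\varepsilon^{2r}}{(2r+3)(2r+4)},$

an increasing function with $\gamma(0) = 1/12$ and $\gamma(1) = -1/2 + \log 2 \approx 0.1931$.
Define $\varepsilon = (2K - N)/N$, where $K = k - 1$ and $N = n - 1$. Define $\lambda_n$ as
in (3). Then there is a constant $C$ such that

$\mathbb{P}\{X \ge k\} = \bar\Phi(\varepsilon\sqrt{N})\exp(A_n(\varepsilon)),$

where 

$A_n(\varepsilon) = -N\varepsilon^4\gamma(\varepsilon) - \tfrac12\log(1-\varepsilon^2) - \lambda_{n-k} + r_k$  and  $-C\log N \le N r_k \le C$

for all $\varepsilon$ corresponding to the range $n/2 < k \le n - 1$.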

Notice that the $\lambda_{n-k}$ can be absorbed into the error terms, and that
$\log(1 - \varepsilon^2)$ is small compared with $N\varepsilon^4 + O(n^{-1})$, when $\varepsilon \le \varepsilon_0 < 1$.
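
The content of Theorem 1 can be probed numerically by dropping the remainder $r_k$ and comparing with the exact Binomial tail. The sketch below is only an illustration under that reading of the theorem; all function names are hypothetical, $\gamma$ is evaluated from its power series, and $\lambda_{n-k}$ is computed from `math.lgamma`.

```python
from math import comb, erfc, exp, lgamma, log, pi, sqrt


def normal_tail(x):
    """P{N(0,1) > x}."""
    return 0.5 * erfc(x / sqrt(2))


def gamma_series(eps, terms=400):
    """gamma(eps) as the truncated series sum eps^(2r) / ((2r+3)(2r+4))."""
    return sum(eps ** (2 * r) / ((2 * r + 3) * (2 * r + 4)) for r in range(terms))


def gamma_closed(eps):
    """gamma(eps) from the closed form, valid for 0 < eps < 1."""
    numerator = (1 + eps) * log(1 + eps) + (1 - eps) * log(1 - eps) - eps ** 2
    return numerator / (2 * eps ** 4)


def stirling_remainder(m):
    """lambda_m from Stirling's formula, for m >= 1."""
    return lgamma(m + 1) - (0.5 * log(2 * pi) + (m + 0.5) * log(m) - m)


def exact_tail(n, k):
    """Exact P{Bin(n, 1/2) >= k}."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def theorem1_without_remainder(n, k):
    """Phi-bar(eps sqrt(N)) * exp(A_n(eps)) with r_k set to zero, n/2 < k <= n - 1."""
    N, K = n - 1, k - 1
    eps = (2 * K - N) / N
    exponent = (-N * eps ** 4 * gamma_series(eps)
                - 0.5 * log(1 - eps ** 2)
                - stirling_remainder(n - k))
    return normal_tail(eps * sqrt(N)) * exp(exponent)


if __name__ == "__main__":
    for k in (55, 70, 90, 99):
        print(k, exact_tail(100, k) / theorem1_without_remainder(100, k))
```

The printed ratio equals $\exp(r_k)$, so values close to one across the whole range, including $k = n - 1$, are what the theorem leads one to expect.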

A very precise approximation for the cutpoints $\beta_k$ follows from Theorem 1
and inequalities (see Section 3) for the tails of the normal distribution.

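Theorem 2. Let $z_k = 2(\beta_k - n/2)/\sqrt{n}$ and $\varepsilon = (2K - N)/N$. Let
$S(\varepsilon) = \sqrt{1 + 2\varepsilon^2\gamma(\varepsilon)}$ for $\gamma(\varepsilon)$, as in Theorem 1. Then, for some
constant $C'$ and $n \ge 28$,

$z_k = \varepsilon\sqrt{N}\,S(\varepsilon) + \dfrac{\log(1-\varepsilon^2) + 2\lambda_{n-k}}{2\varepsilon\sqrt{N}\,S(\varepsilon)} + \theta_k$

with $-C'(\varepsilon\sqrt{N} + 1) \le N\theta_k \le C'(\varepsilon\sqrt{N} + \log N)$ for all $\varepsilon$ corresponding to
the range $n/2 < k \le n - 1$.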

For example, the theorem implies $\beta_k - k + 1/2 = o(1)$ uniformly over a
range where $|k - n/2| = o(n^{2/3})$. Also, when $\varepsilon \le \varepsilon_0 < 1/2$, the log term can
be absorbed into the $O(\varepsilon/\sqrt{n})$ errors. Even when $k$ gets close to $n - 1$,
the log term contributes only an $O(n^{-1/2}\log n)$ to the approximation. More
precisely, if $k = n - B$ for a fixed $B \ge 1$, our approximation simplifies to

(4)    $\beta_{n-B} = \dfrac{1+c}{2}\,n - \dfrac{2B+1}{4c}\,\log n + O(1)$  where $c = S(1) \approx 1.177$,

which agrees up to $O(1)$ terms with the result obtained by direct calculation
from 

$\mathbb{P}\{X \ge n - B\} = \Bigl(\binom{n}{0} + \cdots + \binom{n}{B}\Bigr)2^{-n} = \dfrac{n^B}{B!\,2^n}\,(1 + o(1))$

and the well-known approximation for normal percentiles, 

$\bar\Phi^{-1}(p) = y - \dfrac{\log y}{y} + O(1/y)$  as $p \to 0$, where $y = \sqrt{2\log(1/p)}$.

By contrast, the upper bound for $\beta_{n-B}$ from (1) is about 0.088n too large.
It is also an easy consequence of Theorem 2 that there exist positive
constants $C_i$ for which 

(5)    [display not recoverable from the source]

for $n/2 < k \le n$ and all $n$. For the quantile coupling between an $X$
distributed Bin(n, 1/2) and a $Y = n/2 + \sqrt{n}Z/2$ distributed N(n/2, n/4), it
follows that there is a positive constant $C$ for which 

[display partly lost in the source; the legible part reads "and $|X - Y| \le C + \ldots$"]

Using the fact that $|X - n/2| \le n/2$, we could also write the upper bound
for $|X - Y|$ as a constant multiple of $1 + Z^2(1 \wedge |Z|/\sqrt{n})$, which improves
on Tusnady's $1 + Z^2/8$ modulo multiplicative constants. (We have made no 

attempt to find the best constants, even though, in principle, explicit values 
could be found by our method.) 

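2. Outline of our method. As in Theorem 1, write $\varepsilon = (2K - N)/N$,
where $K = k - 1$ and $N = n - 1$. Then $K/N = (1 + \varepsilon)/2$ and the range
$n/2 < k \le n$ (after the symmetry reduction of Section 1) corresponds to

(6)    $1 \ge \varepsilon = \dfrac{2K}{N} - 1 \ge \begin{cases} N^{-1} & \text{when } n \text{ is even,} \\ 2N^{-1} & \text{when } n \text{ is odd.} \end{cases}$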


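Define $2H(t) = (1 + \varepsilon)\log t + (1 - \varepsilon)\log(1 - t)$ for $0 < t < 1$.
Representation (2) can then be rewritten as

$\mathbb{P}\{X \ge k\} = \dfrac{n!}{K!\,(N-K)!} \displaystyle\int_0^{1/2} \exp\bigl(K\log t + (N-K)\log(1-t)\bigr)\,dt = \dfrac{n\,N!}{K!\,(N-K)!} \int_0^{1/2} \exp\bigl(NH(t)\bigr)\,dt.$

By Stirling's formula (3), 

$\dfrac{N!}{K!\,(N-K)!} = \dfrac{1}{\sqrt{2\pi}} \sqrt{\dfrac{4}{N(1-\varepsilon^2)}}\; \exp\bigl(\Lambda - NH(K/N)\bigr)$  where $\Lambda := \lambda_N - \lambda_K - \lambda_{n-k}$.

Thus, the beta integral representation of $\mathbb{P}\{X \ge k\}$ equals

$\dfrac{n}{N}\, \exp\Bigl(\Lambda - \tfrac12\log(1-\varepsilon^2) - NH(K/N)\Bigr) \sqrt{\dfrac{4N}{2\pi}} \displaystyle\int_0^{1/2} \exp\bigl(NH(t)\bigr)\,dt.$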

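The function $H(\cdot)$ is concave on (0,1). It achieves its global maximum
at $K/N$, which lies outside the range of integration. On the interval (0, 1/2]
the maximum is achieved at 1/2. On the range of integration, $H(t) - H(K/N)$
is never greater than 

$H(1/2) - H(K/N) = -\tfrac12(1+\varepsilon)\log(1+\varepsilon) - \tfrac12(1-\varepsilon)\log(1-\varepsilon) = -\tfrac12\varepsilon^2 - \varepsilon^4\gamma(\varepsilon).$

The concave function $h(s) := H((1 - s)/2) - H(1/2)$ achieves its maximum
value of zero at $s = 0$ and 

(7)    $\mathbb{P}\{X \ge k\} = e^{\Delta} \sqrt{\dfrac{N}{2\pi}} \displaystyle\int_0^1 \exp\bigl(Nh(s) - \tfrac12 N\varepsilon^2\bigr)\,ds,$

where $\Delta = \log(1 + N^{-1}) + \Lambda - \tfrac12\log(1 - \varepsilon^2) - N\varepsilon^4\gamma(\varepsilon)$.

The $\Delta$ contributes $O(1/n) - \lambda_{n-k} - \tfrac12\log(1 - \varepsilon^2) - N\varepsilon^4\gamma(\varepsilon)$ to the $A_n(\varepsilon)$
from Theorem 1. Taylor's expansion of $h(s)$ about $s = 0$ and concavity of $h(\cdot)$
show that the exponent $Nh(s)$ drops off rapidly as $s$ moves away from zero.
Indeed, 

(8)    $h(s) = -\varepsilon s - \tfrac12 s^2 + \tfrac16 s^3 h'''(s^*)$  with $0 < s^* < s$
       $\phantom{h(s)} \approx \tfrac12\varepsilon^2 - \tfrac12(s + \varepsilon)^2$  for $s$ near zero.

See Section 4 for the more precise statement of the approximation. 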

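Most of the contribution to the integral (7) comes from $s$ in a small
neighborhood of 0. Ignoring tail contributions to the integral, we will then
have 

(9)    $\mathbb{P}\{X \ge k\} \approx e^{\Delta} \sqrt{\dfrac{N}{2\pi}} \displaystyle\int_0^{\infty} \exp\bigl(-\tfrac12 N(s + \varepsilon)^2\bigr)\,ds = e^{\Delta}\,\bar\Phi(\varepsilon\sqrt{N}),$

as asserted by Theorem 1. 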
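
The Laplace-method step can be illustrated numerically. The sketch below is a rough check, not the authors' computation: it assumes $h(s) = \tfrac12[(1+\varepsilon)\log(1-s) + (1-\varepsilon)\log(1+s)]$, which follows from the definition of $H$, and compares a midpoint-rule value of $\sqrt{N/2\pi}\int_0^1 \exp(Nh(s) - \tfrac12 N\varepsilon^2)\,ds$ with the normal tail $\bar\Phi(\varepsilon\sqrt{N})$. The function names and the step count are illustrative choices.

```python
from math import erfc, exp, log, pi, sqrt


def normal_tail(x):
    """P{N(0,1) > x}."""
    return 0.5 * erfc(x / sqrt(2))


def h(s, eps):
    """h(s) = H((1 - s)/2) - H(1/2), for 0 <= s < 1."""
    return 0.5 * ((1 + eps) * log(1 - s) + (1 - eps) * log(1 + s))


def laplace_integral(N, eps, steps=20000):
    """Midpoint rule for sqrt(N/(2 pi)) * int_0^1 exp(N h(s) - N eps^2 / 2) ds."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * width  # midpoints stay strictly inside (0, 1)
        total += exp(N * h(s, eps) - 0.5 * N * eps * eps)
    return sqrt(N / (2 * pi)) * total * width


if __name__ == "__main__":
    N, eps = 99, 0.2
    print(laplace_integral(N, eps), normal_tail(eps * sqrt(N)))
```

Because $h(s) \le \tfrac12\varepsilon^2 - \tfrac12(s+\varepsilon)^2$ on $[0,1)$, the numerical integral should fall slightly below the normal tail, with the gap reflecting the cubic term in the Taylor expansion.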

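To derive Theorem 2 we perturb the argument $\varepsilon\sqrt{N}$ slightly to absorb
the factor $\exp(A_n(\varepsilon))$. We seek a $y$ for which

$\bar\Phi(\varepsilon\sqrt{N} + y) \approx \exp(A_n(\varepsilon))\,\bar\Phi(\varepsilon\sqrt{N}) = \mathbb{P}\{X \ge k\}.$

That is, we need 

$\bar\Phi(\varepsilon\sqrt{N} + y)/\bar\Phi(\varepsilon\sqrt{N}) \approx \exp\bigl(-N\varepsilon^4\gamma(\varepsilon) - \tfrac12\log(1 - \varepsilon^2)\bigr).$

As shown in the next section, the ratio of normal tail probabilities
$\bar\Phi(x + y)/\bar\Phi(x)$ behaves like $\exp(-xy - y^2/2)$, at least when $x$ is large. Ignore
the logarithmic term for the moment. Then the heuristic suggests that we
choose $y$ to make $\varepsilon\sqrt{N}\,y + y^2/2 \approx N\varepsilon^4\gamma(\varepsilon)$, that is,

$y \approx -\varepsilon\sqrt{N} + \sqrt{N\varepsilon^2 + 2N\varepsilon^4\gamma(\varepsilon)}$

and, hence, 

$z_k \approx \varepsilon\sqrt{N} + y \approx \varepsilon\sqrt{N}\,\sqrt{1 + 2\varepsilon^2\gamma(\varepsilon)}.$

For the rigorous proof of Theorem 2 we need to replace these heuristic
approximations by inequalities giving upper and lower bounds for $\bar\Phi(z_k)$,
then invoke the inequalities for normal tails derived in the next section. 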
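
The heuristic can be compared with exact cutpoints. The sketch below is an illustration under the reading that the approximating quantity is $\varepsilon\sqrt{N}S(\varepsilon)$ plus the correction $[\log(1-\varepsilon^2) + 2\lambda_{n-k}]/(2\varepsilon\sqrt{N}S(\varepsilon))$; the function names are hypothetical and `statistics.NormalDist` is assumed for normal quantiles.

```python
from math import comb, lgamma, log, pi, sqrt
from statistics import NormalDist


def gamma_series(eps, terms=400):
    """gamma(eps) as the truncated series sum eps^(2r) / ((2r+3)(2r+4))."""
    return sum(eps ** (2 * r) / ((2 * r + 3) * (2 * r + 4)) for r in range(terms))


def stirling_remainder(m):
    """lambda_m from Stirling's formula, for m >= 1."""
    return lgamma(m + 1) - (0.5 * log(2 * pi) + (m + 0.5) * log(m) - m)


def exact_z(n, k):
    """Standardized cutpoint z_k for n/2 < k <= n, from the exact upper tail."""
    upper = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return -NormalDist().inv_cdf(upper)


def approximate_z(n, k):
    """eps sqrt(N) S(eps) plus the logarithmic correction, for n/2 < k <= n - 1."""
    N, K = n - 1, k - 1
    eps = (2 * K - N) / N
    lead = eps * sqrt(N) * sqrt(1 + 2 * eps ** 2 * gamma_series(eps))
    correction = (log(1 - eps ** 2) + 2 * stirling_remainder(n - k)) / (2 * lead)
    return lead + correction


if __name__ == "__main__":
    for k in (60, 70, 90, 99):
        print(k, exact_z(100, k), approximate_z(100, k))
```

For $n = 100$ the two columns should agree to within a few hundredths, which is the order of magnitude the error bounds of Theorem 2 allow.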



3. Tails of the normal distributions. The classical tail bounds for the
normal distribution [cf. Feller (1968), Section VII.1 and Problem 7.1] show
that $\bar\Phi(x)$ behaves roughly like the density $\phi(x)$:

(10)    $\Bigl(\dfrac1x - \dfrac1{x^3}\Bigr)\phi(x) < \bar\Phi(x) < \dfrac1x\,\phi(x)$  and  $\bar\Phi(x) \le \tfrac12\exp(-x^2/2)$,  for $x > 0$.

The first upper bound is good for large $x$, the second for $x \approx 0$. For the proofs
of both Theorem 1 (in Section 4) and Theorem 2 (in Section 5), we will need
to bound the ratio $\bar\Phi(x + y)/\bar\Phi(x)$. It is possible to derive suitable bounds
directly from (10), but we have found it easier to work with inequalities
that interpolate smoothly between the different cases in (10). We express
our results in logarithmic form, using the function $\Psi(x) := -\log\bar\Phi(x)$ and
its derivative 

$\rho(x) = \dfrac{d}{dx}\Psi(x) = \phi(x)/\bar\Phi(x).$

To a first approximation, the positive function $\rho(x)$ increases like $x$. By
inequality (10), the error of approximation, $r(x) := \rho(x) - x$, is positive for
$x > 0$ and, for $x > 1$,

$r(x) \le \dfrac{x}{x^2 - 1} = O(1/x)$  as $x \to \infty$.

In fact, as shown by the proof of the next lemma, $\rho(\cdot)$ is increasing and $r(\cdot)$
is decreasing and positive on the whole real line. 



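Lemma 1. The function $\rho(\cdot)$ is increasing and the function $r(\cdot)$ is
decreasing, with $r(\infty) = \rho(-\infty) = 0$ and $r(0) = \rho(0) = 2/\sqrt{2\pi} \approx 0.7979$. For all
$x \in \mathbb{R}$ and $\delta \ge 0$, the increments of the function $\Psi(x) := -\log\bar\Phi(x)$ satisfy
the following inequalities: 

(i) $\delta\rho(x) \le \Psi(x + \delta) - \Psi(x) \le \delta\rho(x + \delta)$,

(ii) $\delta r(x + \delta) \le \Psi(x + \delta) - \Psi(x) - \tfrac12(x + \delta)^2 + \tfrac12 x^2 \le \delta r(x)$,

(iii) $x\delta + \tfrac12\delta^2 \le \Psi(x + \delta) - \Psi(x) \le \rho(x)\delta + \tfrac12\delta^2$.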
Proof. Let $Z$ be N(0, 1) distributed. Define $M(x) = \mathbb{P}e^{-x|Z|}$, a decreasing
function of $x$ with $\log M(x)$ strictly convex. Notice that

$1/\rho(x) = \sqrt{2\pi}\,\exp(x^2/2) \displaystyle\int_0^{\infty} \phi(z + x)\,dz = \int_0^{\infty} \exp(-xz - z^2/2)\,dz = \sqrt{\dfrac{\pi}{2}}\,M(x).$

Thus, $-\log M(x) - \log\sqrt{\pi/2} = \log\rho(x) = \Psi(x) - \tfrac12 x^2 - \log\sqrt{2\pi}$ is a
concave, increasing function of $x$ with derivative $\rho(x) - x = r(x)$. It follows that
$r(\cdot)$ is a decreasing function, because 

$r'(x) = -\dfrac{d^2}{dx^2}\log M(x) < 0$  by convexity of $\log M(x)$.

Inequality (i) follows from the equality 

$\Psi(x + \delta) - \Psi(x) = \delta\Psi'(y^*) = \delta\rho(y^*)$  for some $x < y^* < x + \delta$,

together with the fact that $\rho(\cdot)$ is an increasing function. Similarly, the fact
that 

$\dfrac{d}{dy}\Bigl(\Psi(y) - \tfrac12 y^2\Bigr) = \rho(y) - y = r(y)$,  which is a decreasing function,

gives inequality (ii). Inequality (iii) follows from (ii) because $\delta r(x + \delta) \ge 0$
and $x\delta + r(x)\delta = \rho(x)\delta$. □

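Reexpressed in terms of the tail function $\bar\Phi$, the three inequalities from
the lemma become: 

(i) $\exp(-\delta\rho(x)) \ge \bar\Phi(x + \delta)/\bar\Phi(x) \ge \exp(-\delta\rho(x + \delta))$,

(ii) $\exp(-\delta r(x + \delta)) \ge \exp(x\delta + \delta^2/2)\,\bar\Phi(x + \delta)/\bar\Phi(x) \ge \exp(-\delta r(x))$,

(iii) $\exp(-x\delta - \delta^2/2) \ge \bar\Phi(x + \delta)/\bar\Phi(x) \ge \exp(-\rho(x)\delta - \delta^2/2)$.

Less formally, 

$\mathbb{P}\{Z \le x + \delta \mid Z > x\} = 1 - \bar\Phi(x + \delta)/\bar\Phi(x) \approx \delta\rho(x)$  for small $\delta$,

which corresponds to the fact that $\rho$ is the hazard rate for the N(0, 1)
distribution. 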
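
The three inequalities of Lemma 1 lend themselves to a quick numerical sanity check. The sketch below is illustrative only; the function names are hypothetical, the normal tail is computed from `math.erfc`, and a small tolerance is assumed to absorb rounding error (the inequalities hold with equality at $\delta = 0$).

```python
from math import erfc, exp, log, pi, sqrt


def tail(x):
    """Phi-bar(x) = P{N(0,1) > x}."""
    return 0.5 * erfc(x / sqrt(2))


def rho(x):
    """Hazard rate phi(x) / Phi-bar(x)."""
    return exp(-x * x / 2) / sqrt(2 * pi) / tail(x)


def r(x):
    """Error of the approximation rho(x) ~ x."""
    return rho(x) - x


def psi(x):
    """Psi(x) = -log Phi-bar(x)."""
    return -log(tail(x))


def lemma1_holds(x, delta, tol=1e-9):
    """Check inequalities (i), (ii) and (iii) of Lemma 1 at the point (x, delta)."""
    increment = psi(x + delta) - psi(x)
    first = delta * rho(x) - tol <= increment <= delta * rho(x + delta) + tol
    centered = increment - 0.5 * (x + delta) ** 2 + 0.5 * x ** 2
    second = delta * r(x + delta) - tol <= centered <= delta * r(x) + tol
    lower = x * delta + 0.5 * delta ** 2
    upper = rho(x) * delta + 0.5 * delta ** 2
    third = lower - tol <= increment <= upper + tol
    return first and second and third


if __name__ == "__main__":
    print(round(rho(0.0), 4))  # compare with 2 / sqrt(2 pi)
    print(all(lemma1_holds(x / 2, d) for x in range(-6, 11) for d in (0.1, 1.0, 2.0)))
```

The grid is kept to moderate arguments because `erfc` underflows for very large $x$, at which point $\Psi$ can no longer be evaluated this way.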

4. Details of the proof for Theorem 1. To make the proof rigorous, we
need to replace the approximation in the Taylor expansion (8) by upper and
lower bounds involving the third derivative 

$h'''(s) = \dfrac{1 - \varepsilon}{(1 + s)^3} - \dfrac{1 + \varepsilon}{(1 - s)^3} = -\dfrac{6s + 2s^3 + \varepsilon(2 + 6s^2)}{(1 - s^2)^3}.$

The derivative of this function is negative for all $s$. Thus,

$h'''(s) \le h'''(0) = -2\varepsilon$  for $0 \le s < 1$

and 

$h(s) \le \tfrac12\varepsilon^2 - \tfrac12(s + \varepsilon)^2$  for $0 \le s < 1$.

The right-hand side of the approximation (9) is actually an upper bound,
because the integrand is nonnegative on $(1, \infty)$. That is,

$\mathbb{P}\{X \ge k\} \le e^{\Delta}\,\bar\Phi(\varepsilon\sqrt{N}),$

which gives the upper bound for $A_n(\varepsilon)$ stated in the theorem.

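For the lower bound, for some small positive $\eta$ discard the contribution
to the integral in (7) from the range $(\eta, 1)$, and bound $h'''$ from below by
$h'''(\eta)$ on the range $(0, \eta)$, then integrate to get

$\mathbb{P}\{X \ge k\} \ge e^{\Delta} \sqrt{\dfrac{N}{2\pi}} \displaystyle\int_0^{\eta} \exp\Bigl(-\tfrac12 N(s + \varepsilon)^2 + \tfrac16 N\eta s^2 h'''(\eta)\Bigr)\,ds$

$\qquad = e^{\Delta} \sqrt{\dfrac{N}{2\pi}} \displaystyle\int_0^{\eta} \exp\Bigl(-\tfrac12 N\kappa^2(s + \varepsilon/\kappa^2)^2 + \tfrac12 N\varepsilon^2/\kappa^2 - \tfrac12 N\varepsilon^2\Bigr)\,ds$

$\qquad = \dfrac{e^{\Delta}}{\kappa} \exp\Bigl(\tfrac12 N\varepsilon^2/\kappa^2 - \tfrac12 N\varepsilon^2\Bigr) \Bigl(\bar\Phi(\varepsilon\sqrt{N}/\kappa) - \bar\Phi(\varepsilon\sqrt{N}/\kappa + \kappa\eta\sqrt{N})\Bigr),$

where 

$\kappa^2 = 1 - \tfrac13\eta h'''(\eta) \le 1 + 6\eta(\eta + \varepsilon)$  if $\eta \le \tfrac12$.

From Lemma 1, parts (iii) and (ii), 

$\bar\Phi(\varepsilon\sqrt{N}/\kappa + \kappa\eta\sqrt{N}) \le \bar\Phi(\varepsilon\sqrt{N}/\kappa)\exp\bigl(-N\varepsilon\eta - \tfrac12 N\kappa^2\eta^2\bigr)$

and 

$\exp(\tfrac12 N\varepsilon^2)\,\bar\Phi(\varepsilon\sqrt{N}) \le \exp(\tfrac12 N\varepsilon^2/\kappa^2)\,\bar\Phi(\varepsilon\sqrt{N}/\kappa).$

Thus 

(11)    $\mathbb{P}\{X \ge k\} \ge \exp(\Delta - \log\kappa)\,\bar\Phi(\varepsilon\sqrt{N})\bigl[1 - \exp\bigl(-N\varepsilon\eta - \tfrac12 N\kappa^2\eta^2\bigr)\bigr].$

We need $\log\kappa = O(\ell_N)$, where $\ell_N = N^{-1}\log N$, for otherwise the asserted
inequality $-C\log N \le Nr_k$ would be violated. As $\log\kappa \le 6(\eta^2 + \eta\varepsilon)$, this
requirement suggests that we take $\eta$ as a solution to the equation $\tfrac12\eta^2 + \eta\varepsilon = \ell_N$,
that is, $\eta := -\varepsilon + \sqrt{\varepsilon^2 + 2\ell_N}$. We would then have $\kappa^2 \le 1 + 12\ell_N$ and $\eta \le 1/2$,
at least for $n \ge 28$. Also, the exponent $-N\varepsilon\eta - \tfrac12 N\kappa^2\eta^2$ is smaller than
$-\log N$, which ensures that the final, bracketed term in (11) only contributes
another $O(N^{-1})$ to the $A_n(\varepsilon)$ from Theorem 1.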
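5. Details of the proof for Theorem 2. Written using the $\Psi$ function
from Lemma 1, the assertion of Theorem 1 implies that 

$\Psi(z_k) = \Psi(\varepsilon\sqrt{N}) + B_n(\varepsilon) + \tau_k,$

where, for some constant $C$,

$B_n(\varepsilon) = N\varepsilon^4\gamma(\varepsilon) + \tfrac12\log(1 - \varepsilon^2) + \lambda_{n-k}$  and  $-CN^{-1} \le \tau_k \le C\ell_N$

for $\varepsilon$ corresponding to the range $n/2 < k \le n - 1$, that is, for $0 < \varepsilon \le 1 - 2N^{-1}$.

Define 

$w_k = \varepsilon\sqrt{N}\,S(\varepsilon) + \dfrac{\log(1 - \varepsilon^2) + 2\lambda_{n-k}}{2\varepsilon\sqrt{N}\,S(\varepsilon)}.$

We need to show that there is a constant $C'$ for which $z_k = w_k + \theta_k$, with
$-C'(\varepsilon\sqrt{N} + 1) \le N\theta_k \le C'(\varepsilon\sqrt{N} + \log N)$ for $0 < \varepsilon \le 1 - 2N^{-1}$. Consider
two cases. 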



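5.1. Suppose $\varepsilon \le C_0/\sqrt{N}$ for some constant $C_0$. Uniformly over that
range $B_n(\varepsilon) = O(N^{-1})$ and $w_k = \varepsilon\sqrt{N} + O(N^{-1})$. From Lemma 1(i), for all
nonnegative $\delta_1$ and $\delta_2$,

$\Psi(x) + \delta_1\rho(x) \le \Psi(x + \delta_1)$  and  $\Psi(x - \delta_2) + \delta_2\rho(x - \delta_2) \le \Psi(x).$

With $x$ equal to $\varepsilon\sqrt{N}$ and $C_1$ a large enough constant, deduce that

$\Psi(\varepsilon\sqrt{N} - C_1N^{-1}) \le \Psi(z_k) \le \Psi(\varepsilon\sqrt{N} + C_1\ell_N)$

and, hence, 

$w_k - O(N^{-1}) - C_1N^{-1} \le z_k \le w_k + O(N^{-1}) + C_1\ell_N.$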

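5.2. Suppose $C_0/\sqrt{N} \le \varepsilon \le 1 - 2N^{-1}$. Write $x$ for $\varepsilon\sqrt{N}$ and $\beta$ for
$B_n(\varepsilon) + \tau_k = \Psi(z_k) - \Psi(x)$. For all $\varepsilon$ in this range, if $C_0$ is large enough, we have
$\beta > 0$ and $r(x) \le 2/x$. The function $h(t) = t - \sqrt{t^2 + 2\beta}$ is negative,
increasing and concave, with $h'(t) \le 2\beta/t^2$. The positive numbers $\delta_1 = -h(x)$
and $\delta_2 = -h(\rho(x))$ are roots of two quadratic equations, $\delta_1 x + \tfrac12\delta_1^2 = \beta =
\delta_2\rho(x) + \tfrac12\delta_2^2$. From Lemma 1(iii),

$\Psi(z_k) - \Psi(x) = x\delta_1 + \tfrac12\delta_1^2 \le \Psi(x + \delta_1) - \Psi(x),$
$\Psi(x + \delta_2) - \Psi(x) \le \rho(x)\delta_2 + \tfrac12\delta_2^2 = \Psi(z_k) - \Psi(x),$

which imply that $x + \delta_2 \le z_k \le x + \delta_1$. These bounds force $z_k$ to lie close to
$x + \delta_1$:

$0 \le x + \delta_1 - z_k \le \delta_1 - \delta_2 = h(\rho(x)) - h(x) \le r(x)h'(x) \le 4\beta/x^3 = O(\varepsilon/\sqrt{N}).$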
And $x + \delta_1$ lies close to $w_k$:

$x + \delta_1 = \sqrt{N\varepsilon^2 + 2\beta} = \varepsilon\sqrt{N}\,S(\varepsilon)\Bigl(1 + \dfrac{\log(1 - \varepsilon^2) + 2\lambda_{n-k} + 2\tau_k}{N\varepsilon^2 S(\varepsilon)^2}\Bigr)^{1/2} = w_k + {}$ [remainder term illegible in the source].

The assertion of Theorem 2 follows. 